Bayesian analyses of direct radiocarbon dates reveal geographic variations in the rate of rice farming dispersal in prehistoric Japan

The adoption of rice farming during the first millennium BC was a turning point in Japanese prehistory, defining the subsequent cultural, linguistic, and genetic variation in the archipelago. Here, we use a suite of novel Bayesian techniques to estimate the regional rates of dispersal and arrival time of rice farming using radiocarbon dates on charred rice remains. Our results indicate substantial variations in the rate of dispersal of rice within the Japanese islands, hinting at the presence of a mixture of demic and cultural diffusion, geographic variations in the suitability of its cultivation, and the possible role of existing social networks in facilitating or hindering the adoption of the new subsistence economy.

Supplementary Text S1

Handling the Impact of Radiocarbon Measurement Error and Calibration Plateau
Estimates of farming dispersal rates are typically obtained by regressing the distance of focal sites from a putative point of origin against the mean or the median calibrated radiocarbon date of the earliest evidence of farming on those sites (6,9,10,38,42). The rate is then computed by taking the negative (in case time is represented by BP and positive values) reciprocal of the slope coefficient (but see (9) for fitting time against distance instead). Confidence intervals of the derived rate estimates do not, however, account for the impact of the measurement error of the radiocarbon dates, nor the fact that these errors are non-random due to calibration effects. Figure S1 illustrates the implication of this issue, showing a hypothetical dispersal of 2.85 km/year with four observation points in calendar time (black points) and their corresponding median calibrated dates (red points). The latter converge to nearly identical values due to the effect of the calibration plateau, leading to a slope approximately equal to 0, which in turn corresponds to an estimated instantaneous dispersal.
A possible solution, first introduced by Steele (65), and later adopted in other studies (e.g. (6,43)), consists of fitting the regression model to iteratively sampled calendar dates from each calibrated distribution and obtaining multiple estimates from which confidence intervals are derived. While this solution accounts for the uncertainty of each measurement, it holds a conservative stance where effectively each point is treated separately, without acknowledging that these uncertainties are correlated to each other. Furthermore, these approaches often consist of aggregating the estimates of individual iterations but not their associated errors (but see (43)); hence the resulting confidence intervals could potentially fail to properly account for the uncertainty derived from sampling error.

Bayesian Quantile Regression
An alternative approach is to fit a hierarchical measurement error model, following the same principles adopted in Bayesian phase models (66), and more recently in radiocarbon-based demographic inference (64). In this case, the regression model can be conceptualised as follow: where i is the true calendar date of the date at each site focal site i, d i is the distance of the site from the putative source of dispersal, f ( i ) is its corresponding 14 C age on the IntCal20 calibration curve (59), and X i is the observed conventional radiocarbon age of the sample, and σ i is the square root of the sum of the squares of the samples' 14 C age error and the error on the calibration curve. Equation [3] effectively handles the measurement error of and establishes its relationship to the observed 14 C age and its associated error. In practice, equations [1] and [2] can be replaced by any model and following recent works advocating the use of quantile regression (38,39,43) we use an asymmetric Laplace likelihood (60), so that is defined as follows: where i is the same linear model as equation [2] above, and λ is the scale parameter, and the is the user-defined quantile of interest. Because we are interested in the distribution of the earliest dates, previous applications of quantile regression for dispersal processes employed extreme values such as 90th ( =0.9) and 99th percentiles ( =0.99).  Figure S2 shows the result of the quantile regression model described above fitted to the earliest charred rice dates from each archaeological site, using as the putative source of dispersal the earliest date in our sample (Ukikunden Shell Midden in Northern Kyushu; 130.0087, 33.411, Fig.1 main text). We fitted our model using the following priors: 0~N ormal(3000, 200) [5] 1~E xponential(1) [6] λ~Exponential(0.01) [7] setting equal to 0.9.
Posterior estimates of the model parameters were obtained using Markov Chain Monte-Carlo (MCMC) sampling via the nimble (61) and nimbleCarbon (63,64) packages in R (67). We used four chains, each with 6 million iterations (half dedicated to burn-in) and parameters sampled every 300 steps to reduce file size. We used Metropolis-Hasting adaptive random walk sampler for all parameters using default settings, with the exception of the age estimates , where we used an adaptInterval of 20,000 and an adaptFactorExponent to 0.1 which yielded better mixing in test runs. Convergence was evaluated using the Gelman-Rubin diagnostic and visual inspection of the trace plots. Results showed good mixing with the highest R-hat equal to 1.015..  figure S1, with the median date model (in blue) showing a flatter slope and, consequently, a much higher expected rate of dispersal compared to the Bayesian model (in red). This discrepancy is due to the impact of the so-called 'Hallstatt plateau' in the calibration curve and is made evident by the larger discrepancies between the median calibrated dates and the median posterior values of . The discrepancy is almost absent for periods where the impact of the plateau is small and the model overall provides little additional information concerning individual dates. However, dates in the plateau region, which typically show wider and flatter posterior calibrated distributions, are largely "informed" by the overall model, with a resulting shrinking of the measurement error, and in this case, an offset of the resulting median date. While it is worth noting that this model does not account for regional variation in the dispersal rates, the reciprocal of the posterior of the parameter 1 ( figure S3) provides an estimate of the average dispersal rate of charred rice farming between 1.59 and 2.51 km/year.

Bayesian Gaussian Process Quantile Regression
The linear model formalised in equation [2] assumes that 1/ 1 is constant across the geographic window of analyses and hence not able to identify local episodes of acceleration and slow-down in the dispersal process. A slight modification of the linear model can account for such local processes: Here the additional term s i allows for a site-level variation in the dispersal rate. This local deviation parameter can be assumed to be a random effect within a hierarchical setting, but with the important additional assumption that s is spatially autocorrelated. A Gaussian process model (69) can satisfy these two assumptions so that the parameter is modelled as a multivariate normal distribution: with a mean of 0 and with a covariance matrix defined by a quadratic exponential model (see also figure S4): i,j = η 2 exp(-0.5(d i,j /ρ) 2 ) + I i,j ε 2 [9] where the covariance i,j between sites i and j is portrayed as an exponential decay based on their inter-distance d i,j with length scale parameter ρ and maximum covariance equal to the square of the marginal deviation parameter η. The term I i,j ε 2 provides an additional covariance ε 2 in case the i and j are identical.   It is worth mentioning here that the local deviation parameter s is based on the slope and hence should not be directly interpreted in terms of dispersal rate. For example with 1 = -0.5 and s=0, the dispersal rate is equal to 2 km/year (i.e. -1/(0-0.5)), with s = 0.2 the rate increases to ca. 3.33km/year (i.e an increase of 1.33 km/year), whilst s = -0.2 decreases the rate to 1.42 km/year (i.e. a decrease of 0.58 km/year). In other words, the impact of s is symmetric towards the slope, but not towards the rate. This is due to the very nature of the reciprocal of slopes which accelerates quickly towards 0 with a limit of infinite velocity when the slope is equal to 0 (see figure S6).

Priors
Priors for the Gaussian Process Quantile Regression (hereafter GPQR) model were weakly informed by independent lines of archaeological evidence. As for the non-spatial quantile regression model described above, the intercept was based on a normal distribution with a mean of 3,000 and a standard deviation of 200 (see equation [5]) which broadly assumes the arrival of rice to be somewhere between the second half of the 2nd millennium BC and the first half of the 1st millennium BC. Sensible priors for the slope parameter can be based on known archaeological examples of farming dispersal rates, for our case this is equivalent to a reciprocal of s -1 . The variance of the local dispersal rate s is equal to η 2 in the case of a complete lack of spatial autocorrelation. This can be used as the basis for building a prior predictive check model to establish which prior combination yields sensible dispersal rates. Figure S7 shows the result when an exponential prior with a rate of 1 is applied for 1 (cf. equation [6]) and an exponential prior with a rate of 20 is used for η 2 . The results have a wide range of dispersal rates that include observed values from other archaeological contexts.
Given the premise and assumptions of the spread of rice farming in Japan, the parameter was also censored to ensure that the dispersal rate is always positive.
The prior for the length scale parameter ρ was modelled using a gamma distribution modelled with a shape of 10 and a rate of 0.06 and truncated between 1 and 1350 km (the smallest and largest inter-distance between sampled sites). The gamma distribution enforces positive values, with a fairly long tail and, in this case, a mode around 200km. These settings ensure a fairly wide range of scales for regional patterns of local dispersal rates (Fig. S8).

Tactical Simulation
To establish whether the Gaussian Process model described above can recover local variations in dispersal rates, we generated a simulated set of n=150 radiocarbon dates from an equivalent number of sites within our window of analysis. Calendar dates were sampled using a normal distribution (see equations [1], [7][8][9] above) using the following parameter values: ε = 100 ; 0 = 3000; 1 = 0.6; ρ = 150; and η 2 = 0.08. Dates were back-calibrated in 14 C age with an error of 20 years.
We fitted our dataset using our Gaussian Process model using the same prior as in our observed dataset and setting to 0.9. Model fitting was performed using four chains with 1 million iterations each (half discarded as burn-in, and sampled every 50 steps). MCMC diagnostics showed good convergence, with the highest R-hat (70) equal to 1.003.
Our results indicate that the model does capture the spatial variation in dispersal rates (figure S9), with most of the random effect parameters (figure S10) and all the key fixed effect parameters (figure S11) recovering their respective true values within their 95% quantile intervals.

MCMC Settings, Diagnostics, and Posteriors
As for the non-spatial quantile regression, the model fitting was carried out using the nimble and the nimbleCarbon R packages. We ran four chains, each with 2 million iterations, half of which were used for burn-in, and values sampled every 100 steps. We used a Metropolis-Hasting adaptive random walk sampler for all parameters except for 1 and s i in the GPQR model, which were inferred using an automated factor slice sampler (71) to account for correlation between the two parameters. As for the non-spatial quantile regression, the random walker sampler for was modified from the default settings (see section 1 above).
The Gelman-Rubin convergence statistics (70) were below 1.01 for all parameters, and the trace plots of the four key fixed parameters showed good mixing of the chains ( fig. S12 & S13). Figures S14 and S15 show the marginal distribution of these four parameters, whilst tables S1 and S2 provide summary statistics and diagnostics of all fixed parameters.

Bayesian Hierarchical Phase Models with Constraints
Bayesian phase models of radiocarbon dates typically used in stratigraphic contexts have been occasionally applied to study regional phenomena, such as the dating of cultural phases (72,73) or the arrival date of populations (40) or cultural traits (74). These models have been recently used to estimate the arrival dates of crops in different geographic regions (15,75), typically employing uniform distribution models as follows: i,k~U niform(ν k , υ k ) [10] where i,k is the calendar date of sample i from Area k, and ν k and υ k are the start and end dates of the Area k. The relationship between i,k and the radiocarbon sample is established using the same procedure described above in the model [2].
The parameter of interest, in this case, is ν k which provides an estimate of the arrival time of the crop in the focal Area k. While there are some interesting non-Bayesian alternatives to this problem (76), the advantage of this approach is the possibility to add constraints and priors in multi-regional settings (see below).
A key assumption of this approach, however, is that in order to have unbiased estimates of ν k and υ k samples must be randomly and independently selected. For many real statistical applications, this assumption is never truly met, but in the case of crop dispersal, a particular problem arises from the fact that charred remains are often found in distinctive recovery contexts or sites, and as such, samples cannot be regarded as truly independent. A trivial example is to contrast a situation where the inference is carried out on 20 samples from 20 different sites against a situation where the same number of samples are recovered from a single site. In the former case, estimates of ν k and υ k are more representative of the start and end dates of a region, but in the latter case, these would represent the start and end date of the particular site from which the samples are taken. While such extreme situations are rare, multiple samples are frequently taken from the same site or context and as such, might represent a substantial portion of the dates from a particular region. For example, Leipe and colleagues (15) estimated the arrival date of rice farming in the Japanese central highlands using 32 dates, but ca. 40% of them (n=13) were from a single site (Chikarishijori).
One way to approach this problem and to take into account the fact that samples are typically recovered from the same site is to employ a hierarchical model: i,j,k~U niform( j,k , j,k + δ j,k ) [11] j,k~U niform(ν k , υ k ) [12] where j,k is the start date of crop use at site j in region k, and δ j,k is the duration of rice farming in the same site. Inference of the regional start date is then no longer based on the individual charred dates i,j,k but instead from the start date of individual sites. This effectively acknowledges sample independence; sites with a larger number of dates will still provide more information, but mainly in terms of having more accurate estimates of j,k . From an interpretative point, [12] is slightly different from [10] as the former represents the distribution of start dates at each site while the latter represents just the distribution of dates within the area k. However, the interpretation of our primary parameter of interest (i.e. ν k ) as the arrival time of the crop in the practical area remains the same. Model [11] is parameterised in a way so that the end date is effectively defined by the duration parameter δ j,k . Inference of this parameter is actually helped by the presence of sites yielding multiple dates and by the choice of some weakly informative prior that conveys assumptions on site duration. Of course, if the entirety of the dataset consists of one date per site, then there is effectively no way to infer δ j,k , but under those circumstances, arrival dates can be safely estimated using model [10].

Tactical Simulation
To illustrate the difference between the standard model [10] and the proposed hierarchical modification depicted in [11] and [12], we simulated an artificial archaeological dataset consisting of 30 radiocarbon dates distributed exponentially across 10 archaeological sites within a single region (n 1 = 13, n 2 = 3, n 3 = 5, n 4 = 3, n 5 = n 6 = n 7 = n 8 = n 9 = n 10 = 1). Start dates of occupation (i.e. j ) were modelled using a uniform distribution with regional start and end dates ν and υ dates equal to 3500 and 3000 cal BP. The site duration parameter δ j was randomly sampled from a gamma distribution with shape parameter 5 and rate parameter 0.02. The simulated calendar dates were back-calibrated in 14 C age and assigned to an error of 20 years.
We estimated ν and υ using both the conventional non-hierarchical approach (model [10]) and the proposed solution (model [11][12]) using three chains and 2 million iterations (half discarded for burnin and sampled every 100 steps). For model [10] we used a uniform prior for ν and υ between 0 and 10000, with the constraint ν > υ (i.e. ν earlier than υ). For model [11][12] we additionally modelled δ j using a uniform hyper-prior for the rate (bounded between 1 and 20) and a normal distribution with a mean 200 and standard deviation 100 truncated between 1 and 500 for the mode, where the rate is equal to (shape-1)/mode. MCMC diagnostic showed good convergence for all parameters (Rhat < 1.01). Figure S16 shows the difference in the marginal posterior distribution of ν and υ in the two models. The conventional model has a narrower posterior given the inference is made on 30 samples, whilst the hierarchical model has a wider posterior as the inference is primarily driven on the sites. However, the posterior of the conventional model is less accurate and fails to capture the true start date (dashed line), which is instead recovered by the hierarchical model.

Implementation details for the observed data
The hierarchical phase model described in section 3.1 was fitted to the charred rice dates from the eight geographic areas (table S3), whose boundaries were informed by previous archaeological studies and the results of our GPQR analysis, to ensure that dispersal rates within each area were as consistent as possible. We particularly made an effort to avoid having instances where a major change in dispersal rate occurred within each area so that an early estimate of arrival date was the consequence of dates occurring at sites located at the boundary of adjacent areas. Priors of the start and end dates ν and υ were based on a uniform distribution bounded between 500 and 5000 cal BP, with the constraint ν > υ. The duration parameter δ was modelled using a gamma distribution, with the shape and rate parameters modelled using a uniform hyper-prior for the rate (bounded between 1 and 20) and a normal distribution with a mean 200 and standard deviation 100 truncated between 1 and 500 for the mode, where the rate is equal to (shape-1)/mode. Figure S17 shows the prior predictive check with these settings and illustrates how our model includes a wide range of possible settlement durations. We considered two different versions of the hierarchical model. In model a, we infer all parameters without any constraints, hence allowing for all possible temporal orders of our primary parameter of interest ν. This allows us to infer the arrival date of rice in each area on the exclusive basis of the dates from that focal area. In model b, we add the following constraints: ν I > ν kII , ν I > ν III > ν IV > ν V , ν IV > ν VI , ν IV > ν VII , and ν IV > ν VIII . This effectively incorporates the assumption of a wave-of-advance-like model originating from northern Kyushu (Area I), whilst allowing for possible leap-frogging after the dispersal reached the Kansai area (Area IV).
Both models were fitted using 6 million iterations, half of which were dedicated to burn-in, and sampled every 300 steps. We used Metropolis-Hasting adaptive random walk sampler for all parameters using default settings, with the exception of the age estimates , where we used an adaptInterval of 20,000 and an adaptFactorExponent to 0.1. Model convergence was assessed via visual inspections of the trace plots and Gelman-Rubin convergence statistics. All parameters yielded an R-hat below 1.01, with the exception of one 166 which returned an Rhat of 1.03 in model b. Visual inspection of the trace plot indicated that this was most likely due to the multi-modal shape of the posterior. The high agreement index (98.4, (77)), as well as comparison with the associated calibrated date, suggests that there are no concerns from an inferential point of view. Table S4 shows the summary statistics of the arrival date estimates ν for the two models, whilst figures S18 and S19 show their marginal posterior distribution. Figure S20 indicates the probability that ν values of row areas are smaller (i.e. later) than column areas, with cells coloured in blue supporting a wave-of-advance scenario, while those coloured in red indicate possible leap-frog dispersals. Finally, figure S21 shows the pairwise difference in ν (column area minus row area) obtained from the posterior samples, with portions of the 90% highest posterior density interval coloured in blue for positive values (column earlier than row) and in red for negative values (row earlier than column).  Table S4. Summary statistics and diagnostics for the ν parameters in the two models.